Catalogue of research products

Can we tell the author of a message without reading the message? This work tackles authorship analysis through features that ignore the explicit content of a contribution: informally, those that can be computed even if every character in the body of a message (but not metadata such as timing or "likes") is replaced by an X. Focusing on forum posts, we distil a case-study set of these content-agnostic features, and prove its viability for authorship verification and attribution, using data from four online forums (of different size, language, and topic). A simple classification testbed, relying exclusively on content-agnostic features, confirms the author of a message with 76% accuracy, and discriminates between two candidate authors with 94% accuracy. Being able to re-identify a user without looking at the content of her contributions poses a serious threat to common data anonymization practices.
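The informal definition of content-agnostic features above can be turned into a simple invariance check: a feature qualifies if its value does not change when every body character is replaced by X. The sketch below is only an illustration under stated assumptions, not the authors' feature set or code. The `Post` record and the functions `mask_body`, `content_agnostic_features`, and `is_content_agnostic` are hypothetical names. The four features (body length, hour of day, weekday, likes) are merely examples of quantities that survive masking.

```python
from dataclasses import dataclass, replace
from datetime import datetime


@dataclass(frozen=True)
class Post:
    # Hypothetical minimal post record: free-text body plus metadata.
    author: str
    body: str
    timestamp: datetime
    likes: int


def mask_body(post: Post) -> Post:
    """Replace every character of the body with 'X'; metadata is left untouched."""
    return replace(post, body="X" * len(post.body))


def content_agnostic_features(post: Post) -> dict:
    """Illustrative features that survive masking (assumed, not the paper's set)."""
    return {
        "body_length": len(post.body),       # length is preserved by masking
        "hour_of_day": post.timestamp.hour,  # timing metadata is not masked
        "weekday": post.timestamp.weekday(),
        "likes": post.likes,                 # "likes" are metadata, not content
    }


def is_content_agnostic(feature_fn, post: Post) -> bool:
    """True if masking the body does not change the extractor's output on this post."""
    return feature_fn(post) == feature_fn(mask_body(post))


if __name__ == "__main__":
    p = Post("alice", "Hello, world!\nSee you later.", datetime(2016, 5, 22, 14, 30), 3)
    print(content_agnostic_features(mask_body(p)))
    print(is_content_agnostic(content_agnostic_features, p))  # True
    # A feature that reads the text itself, e.g. a comma count, fails the check.
    print(is_content_agnostic(lambda q: q.body.count(","), p))  # False
```

Because this sketch masks every character, including whitespace and newlines, body length is the only text-derived quantity that survives. Passing the check on a single post is a necessary condition, not a proof that a feature ignores content in general.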

Content attribution ignoring content / Samory, M.; Peserico, E. - (2016), pp. 233-243. (8th ACM Web Science Conference, WebSci 2016, Hannover, DE) [10.1145/2908131.2908156].

Content attribution ignoring content

Samory M.; Peserico E.

2016



Year of publication: 2016
Conference name: 8th ACM Web Science Conference, WebSci 2016
Keywords: Attribution; Authorship; Forum; Identification; Privacy; Social; Structural features; Timing
Type: 04 Publication in conference proceedings::04b Conference paper in a volume

Files attached to this product

There are no files associated with this product.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1655758

Warning

Warning! The data displayed have not been validated by the university.
